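Beyond Basic Retrieval: Addressing the Limits of Semantic Similarity

Beyond Similarity

The "80% problem" arises when basic semantic search works well on simple queries but fails on edge cases. With purely similarity-based retrieval, the vector store often returns the chunks that are numerically closest to the query. If those chunks are nearly identical to one another, the language model receives redundant information, wastes its limited context window, and misses the broader perspective.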

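Core Principles of Advanced Retrieval

Maximal Marginal Relevance (MMR): Rather than picking only the most similar items, it balances relevance against diversity to avoid redundancy. $MMR = \text{argmax}_{d \in R \setminus S} [\lambda \cdot \text{sim}(d, q) - (1 - \lambda) \cdot \max_{s \in S} \text{sim}(d, s)]$, where $q$ is the query, $R$ is the set of retrieved candidates, $S$ is the set of items already selected, and $\lambda \in [0, 1]$ weights relevance against diversity.
Self-Querying: Uses a language model to convert natural language into structured metadata filters (for example, filtering by "lecture 3" or "source: PDF").
Contextual Compression: Compresses the retrieved documents, extracting only the "high-nutrient" snippets relevant to the query in order to save tokens.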
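The MMR formula above can be sketched as a greedy selection loop. This is a minimal illustration, not any library's implementation: the function names `cosine` and `mmr_select`, the use of cosine similarity for $\text{sim}$, and the toy two-dimensional "embeddings" are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; stands in for sim(., .) in the MMR formula."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int,
    lam: float = 0.5,
) -> List[int]:
    """Greedily pick k candidate indices, trading relevance against redundancy."""
    selected: List[int] = []                   # S in the formula
    remaining = list(range(len(candidates)))   # R \ S in the formula

    def score(i: int) -> float:
        relevance = cosine(candidates[i], query)
        # Similarity to the closest already-selected item (0 when S is empty).
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected),
            default=0.0,
        )
        return lam * relevance - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy embeddings: indices 0 and 1 are exact duplicates,
# index 2 is less relevant to the query but points in a different direction.
query = [1.0, 0.0]
candidates = [[1.0, 0.1], [1.0, 0.1], [0.6, -0.8]]

diverse = mmr_select(query, candidates, k=2, lam=0.5)
similarity_only = mmr_select(query, candidates, k=2, lam=1.0)
print(diverse, similarity_only)
```

With `lam=1.0` the diversity penalty disappears and the loop reduces to plain similarity ranking, so the duplicate chunk is selected. With `lam=0.5` the duplicate is penalized for matching an already-selected item, and the more distinct chunk takes its place.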

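The Redundancy Trap

Giving a language model the same sentence three times does not make it any smarter. It only raises the cost of the prompt. Diverse information is the key to "high-nutrient" context.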

Knowledge Check

You want your system to answer "What did the instructor say about probability in the third lecture?" specifically. Which tool allows the LLM to automatically apply a filter for { "source": "lecture3.pdf" }?

ConversationBufferMemory

Self-Querying Retriever

Contextual Compression

MapReduce Chain

Challenge: The Token Limit Dilemma

Apply advanced retrieval strategies to solve a real-world constraint.

You are building a RAG system for a legal firm. The documents retrieved are 50 pages long, but only 2 sentences per page are actually relevant to the user's specific query. The standard "Stuff" chain is throwing an OutOfTokens error because the context window is overflowing with irrelevant text.

Step 1

Identify the core problem and select the appropriate advanced retrieval tool to solve it without losing specific nuances.

Problem: The context window limit is being exceeded by "low-nutrient" text surrounding the relevant facts.

Tool Selection: ContextualCompressionRetriever

Step 2

What specific component must you use in conjunction with this retriever to "squeeze" the documents?

Solution: Use an LLMChainExtractor as the base compressor for the ContextualCompressionRetriever. It processes the retrieved documents and extracts only the snippets relevant to the query, passing a much smaller, highly concentrated context to the final prompt.
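To make the "squeeze" step concrete, here is a minimal sketch of extractive compression. It deliberately swaps the LLM-based extractor for a plain keyword-overlap filter so that it runs without a model or API key. The helper names (`split_sentences`, `tokens`, `compress`), the stopword list, the `min_overlap` threshold, and the sample contract text are all hypothetical and chosen only for illustration.

```python
import re
from typing import List, Set

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS: Set[str] = {
    "a", "an", "the", "is", "are", "for", "of", "to", "in",
    "how", "many", "what", "which", "be",
}


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def compress(document: str, query: str, min_overlap: int = 2) -> str:
    """Keep only sentences sharing at least min_overlap content words with the query.

    This keyword filter is a stand-in for an LLM-based extractor; a real
    extractor would judge relevance semantically rather than by word overlap.
    """
    query_terms = tokens(query) - STOPWORDS
    kept = [
        sentence
        for sentence in split_sentences(document)
        if len(query_terms & tokens(sentence)) >= min_overlap
    ]
    return " ".join(kept)


# Hypothetical page of a contract: one relevant sentence among filler.
page = (
    "This agreement is made between the parties. "
    "The tenant must give 60 days notice before termination. "
    "Headings are for convenience only. "
    "Notices shall be delivered in writing."
)
query = "How many days notice is required for termination?"

compressed = compress(page, query)
print(compressed)
print(f"{len(page)} characters reduced to {len(compressed)}")
```

The shape is the same as in the challenge: every retrieved page passes through the compressor before it reaches the final prompt, so only the relevant sentences count against the context window. Word overlap misses paraphrases and inflected forms (here, "Notices" does not match "notice"), which is the gap an LLM-based extractor is meant to close.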